SNPPhenA: a corpus for extracting ranked associations of single-nucleotide polymorphisms and phenotypes from literature

Authors

  • Behrouz Bokharaeian
  • Alberto Díaz
  • Nasrin Taghizadeh
  • Hamidreza Chitsaz
  • Ramyar Chavoshinejad
Abstract

BACKGROUND Single Nucleotide Polymorphisms (SNPs) are among the most important types of genetic variation influencing common diseases and phenotypes. Several corpora and methods have recently been developed for extracting mutations and diseases from text. However, no available corpus for extracting associations from text is annotated with linguistic-based negation, modality markers, neutral candidates, and the confidence level of associations.

METHOD This work presents the steps taken to produce the SNPPhenA corpus. They comprise automatic Named Entity Recognition (NER) followed by manual annotation of SNP and phenotype names, annotation of the SNP-phenotype associations and their level of confidence, and annotation of modality markers. The corpus was also annotated with negation scopes and cues and with neutral candidates, which play a crucial role in handling negation and modality in extraction tasks.

RESULT Inter-annotator agreement was measured with Cohen's Kappa coefficient, and the resulting scores indicated that the corpus is reliable. The Kappa score was 0.79 for annotating the associations and 0.80 for the confidence degree of associations. The basic statistics of the annotated features of the corpus are also presented, together with the results of our first experiments on extracting ranked SNP-phenotype associations. The prepared guideline documents make the corpus easier to use. The corpus, guidelines and inter-annotator agreement analysis are available on the website of the corpus: http://nil.fdi.ucm.es/?q=node/639 .

CONCLUSION Specifying the confidence degree of SNP-phenotype associations reported in articles helps identify the strength of those associations, which could in turn assist genomics scientists in determining phenotypic plasticity and the importance of environmental factors. Moreover, our first experiments with the corpus show that linguistic-based confidence, alongside other non-linguistic features, can be used to estimate the strength of the observed SNP-phenotype associations.

TRIAL REGISTRATION Not Applicable.
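As an illustration of the agreement measure named in the abstract, the following is a minimal sketch of Cohen's Kappa for two annotators, computed as (observed agreement - chance agreement) / (1 - chance agreement). The label names and the six toy annotations are hypothetical and are not taken from the SNPPhenA corpus; the function name `cohens_kappa` is likewise only illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over all labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations (NOT data from the corpus).
annotator_1 = ["positive", "positive", "negative", "neutral", "positive", "negative"]
annotator_2 = ["positive", "negative", "negative", "neutral", "positive", "negative"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

On this toy input the annotators agree on 5 of 6 items and chance agreement is 13/36, so Kappa is 17/23. The scores of 0.79 and 0.80 reported in the abstract come from the authors' own analysis, not from this sketch.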

Similar resources

Single Nucleotide Polymorphisms and Association Studies: A Few Critical Points

Uncovering DNA sequence variations that correlate with phenotypic changes, e.g., diseases, is the aim of sequence variation studies. A common type of sequence variation is the single nucleotide polymorphism (SNP, pronounced "snip"). SNPs are third-generation molecular markers. An SNP is a DNA sequence variant of a single base pair with the minor allele occurring in more than 1% of a given popula...

Effect of Single Nucleotide Polymorphisms in IGF-1R Gene on Growth Rate Traits in Makooei Sheep

Insulin-like growth factor 1 receptor (IGF-1R) is a main receptor of the IGF family and plays a critical role in postnatal and skeletal growth in many species. However, there are few reports on IGF-1R gene structure and its effects on growth traits in sheep. The objectives of this study were the detection of IGF-1R polymorphisms and the assessment of their associations with ...

Association of IGF-I Gene Polymorphisms with Carcass Traits in Iranian Mehraban Sheep Using SSCP Analysis

Molecular genetic selection on individual genes is a promising method for genetically improving economically important traits in livestock. The insulin-like growth factor-I (IGF-I) gene may play important roles in the growth of multiple tissues, including muscle cells, cartilage and bone. The objectives of the present study were to estimate the haplotype frequencies of the IGF-I gene polymorphisms i...

The Effect of Uncoupling Protein Polymorphisms on Growth, Breeding Value of Growth and Reproductive Traits in the Fars Indigenous Chicken

The avian uncoupling protein (avUCP) is a member of the mitochondrial transporter superfamily that uncouples proton entry in the mitochondrial matrix from ATP synthesis. The polymerase chain reaction restriction fragment length polymorphism (PCR-RFLP) method was used to estimate the allele and genotype frequencies of the UCP/HhaI polymorphisms and to determine associations between these polymorp...

Association study of two single nucleotide polymorphisms rs10757278 and rs1333049 with atherosclerosis, a case-control study from Iraq

Atherosclerosis is one of the most important coronary artery diseases (CAD), caused by lipid accumulation, hypertension, smoking, and many other environmental and genetic factors. It has been reported that genetic variations in rs10757278 and rs1333049 are correlated with CAD. In the present study, 100 blood samples were collected (50 CAD patients and 50 apparently healthy con...

Journal title:

Volume 8, Issue 

Pages  -

Publication date: 2017